The purpose of this notebook is to give an overview of the data in the dataset immoscout_cleaned_lat_lon_fixed_v9.csv.
The dataset contains 13378 rows and 108 columns.
We've identified that the dataset contains data on the following features:
| Feature | Columns |
| ------- | ------- |
| Availability | Availability, Availability_merged, Disponibilità, Disponibilité, Verfügbarkeit, detail_responsive#available_from |
| Address | Commune, Comune, Gemeinde, Municipality, Municipality_merged, detail_responvice#municipality, address, Locality, location, location_parsed, table, details_structured, zip |
| Coordinates | Latitude, lat, Longitude, lon |
| Floor | Floor, Floor_merged, Piano, Stockwerk, Étage, detail_responsive#floor, table, details_structured |
| Gross return | Gross return, table, details_structured |
| Plot area | Grundstücksfläche, Plot area, Plot_area_merged, Superficie del terreno, Surface de terrain, detail_responsive#surface_property, table, details_structured |
| Living space | Living space, Living_space_merged, Superficie abitabile, Surface habitable, Wohnfläche, detail_responsive#surface_living, description, table, details_structured |
| Environment | NoisePollutionRailwayL, NoisePollutionRailwayM, NoisePollutionRailwayS, NoisePollutionRoadL, NoisePollutionRoadM, NoisePollutionRoadS, PopulationDensityL, PopulationDensityM, PopulationDensityS, RiversAndLakesL, RiversAndLakesM, RiversAndLakesS, ForestDensityL, ForestDensityM, ForestDensityS, WorkplaceDensityL, WorkplaceDensityM, WorkplaceDensityS, distanceToTrainStation |
| gde | gde_area_agriculture_percentage, gde_area_forest_percentage, gde_area_nonproductive_percentage, gde_area_settlement_percentage, gde_average_house_hold, gde_empty_apartments, gde_foreigners_percentage, gde_new_homes_per_1000, gde_politics_bdp, gde_politics_cvp, gde_politics_evp, gde_politics_fdp, gde_politics_glp, gde_politics_gps, gde_politics_pda, gde_politics_rights, gde_politics_sp, gde_politics_svp, gde_pop_per_km1, gde_population, gde_private_apartments, gde_social_help_quota, gde_tax, gde_workers_sector1, gde_workers_sector2, gde_workers_sector3, gde_workers_total |
| Price | price, price_cleaned, description, details_structured |
| Rooms | rooms, description, details_structured |
| Type | type |
Many features are spread across multiple columns. Two companion notebooks explore how they can be aggregated.
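As a rough illustration of what aggregating such columns could look like, the sketch below takes the first non-missing value across several language variants of one feature. The toy values are invented, and whether the existing `*_merged` columns were built this way is not confirmed by the source.

```python
from functools import reduce

import numpy as np
import pandas as pd

# Toy stand-in for the real data; the cell values are invented for illustration.
toy = pd.DataFrame({
    "Floor": ["1. floor", np.nan, np.nan, np.nan],
    "Stockwerk": [np.nan, "2. Stock", np.nan, np.nan],
    "Piano": [np.nan, np.nan, "Piano terra", np.nan],
})

# combine_first keeps the left value and falls back to the right one where
# the left is missing, so chaining it yields the first non-missing value per row.
variants = [toy[name] for name in ["Floor", "Stockwerk", "Piano"]]
floor_any = reduce(lambda left, right: left.combine_first(right), variants)

print(floor_any.tolist())
```

Rows where every variant is missing stay missing, which makes it easy to count how many objects lack the feature altogether.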
# Import modules
import pandas as pd
import numpy as np
import sweetviz as sv
df = pd.read_csv('https://raw.githubusercontent.com/Immobilienrechner-Challenge/data/main/immoscout_cleaned_lat_lon_fixed_v9.csv', low_memory=False)
df.shape
(13378, 108)
# reorder columns alphabetically and show sweetviz report
df = df.reindex(sorted(df.columns), axis=1)
sweet_report = sv.analyze(df)
sweet_report.show_notebook()
Combining the overview above with a separate analysis of the columns description, detailed_description, table, details_structured and details, we've identified the following features that this dataset provides information on:
The feature availability represents information on when the object is available. Data on it is present in the following columns of the dataset:
Availability, Availability_merged, Disponibilità, Disponibilité, Verfügbarkeit, detail_responsive#available_from, details_structured, table

Data type: either a date or a string such as «On request» or «Immediately».
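Because availability mixes dates and free-text values, one possible way to handle it is to split it into a date column and a note column. The sketch below assumes dates are written as `dd.mm.yyyy`, which the source does not state, and uses invented sample values.

```python
import pandas as pd

# Invented sample values; the date format dd.mm.yyyy is an assumption.
availability = pd.Series(["01.03.2023", "On request", "Immediately", None])

# Values that do not match the assumed format become NaT instead of raising.
available_date = pd.to_datetime(availability, format="%d.%m.%Y", errors="coerce")

# Keep the original text only where no date could be parsed.
available_note = availability.where(available_date.isna())

print(pd.DataFrame({"date": available_date, "note": available_note}))
```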
The dataset provides information on the municipality, zip code, canton, street and house number of an object. Address data can be found in the following columns:
Commune, Comune, Gemeinde, Municipality, Municipality_merged, detail_responvice#municipality, address (with RegEx), Locality, location (with RegEx), location_parsed (with RegEx), table (with RegEx), details_structured (with RegEx)

Data type: string, with or without the canton, which is specified to avoid confusion.
location
df['location'].head()
0 5023 Biberstein, AG
1 Buhldenstrasse 8d5023 Biberstein, AG
2 5022 Rombach, AG
3 Buhaldenstrasse 8A5023 Biberstein, AG
4 5022 Rombach, AG
Name: location, dtype: object
from_location = df['location'].str.extract(r"\d (.+?),")
from_location.head()
| | 0 |
|---|---|
| 0 | Biberstein |
| 1 | Biberstein |
| 2 | Rombach |
| 3 | Biberstein |
| 4 | Rombach |
address
df['address'].head()
0 5023 Biberstein, AG
1 Buhldenstrasse 8d, 5023 Biberstein, AG
2 5022 Rombach, AG
3 Buhaldenstrasse 8A, 5023 Biberstein, AG
4 5022 Rombach, AG
Name: address, dtype: object
from_address = df['address'].str.extract(r"\d (.+?),")
from_address.head()
| | 0 |
|---|---|
| 0 | Biberstein |
| 1 | Biberstein |
| 2 | Rombach |
| 3 | Biberstein |
| 4 | Rombach |
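The pattern `\d (.+?),` only returns the municipality. As a possible extension, the sketch below uses named groups to pull zip code, municipality and canton out of the same string in one pass. It assumes a four-digit zip code and a two-letter canton code at the end of the string; the sample values are taken from the output shown above.

```python
import pandas as pd

address = pd.Series([
    "5023 Biberstein, AG",
    "Buhldenstrasse 8d, 5023 Biberstein, AG",
    "5022 Rombach, AG",
])

# Named groups become column names in the extracted DataFrame.
pattern = r"(?P<zip>\d{4}) (?P<municipality>[^,]+), (?P<canton>[A-Z]{2})$"
parts = address.str.extract(pattern)

print(parts)
```

Rows that do not follow the assumed layout come back as missing values in all three columns, so they can be inspected separately.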
location_parsed
df['location_parsed'].head()
0 Strasse: plz:5023 Stadt: Biberstein Kanton: AG
1 Strasse:Buhldenstrasse 8d plz:5023 Stadt: Bib...
2 Strasse: plz:5022 Stadt: Rombach Kanton: AG
3 Strasse:Buhaldenstrasse 8A plz:5023 Stadt: Bi...
4 Strasse: plz:5022 Stadt: Rombach Kanton: AG
Name: location_parsed, dtype: object
from_location_parsed = df['location_parsed'].str.extract(r"Stadt: (.+?) K")
from_location_parsed.head()
| | 0 |
|---|---|
| 0 | Biberstein |
| 1 | Biberstein |
| 2 | Rombach |
| 3 | Biberstein |
| 4 | Rombach |
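Since location_parsed labels each part of the address, all four fields could in principle be extracted at once. The following is a minimal sketch under the assumption that every value follows the `Strasse: ... plz: ... Stadt: ... Kanton: ...` layout seen above; the row containing `Musterstrasse 1` is a hypothetical example, not a value from the dataset.

```python
import pandas as pd

location_parsed = pd.Series([
    "Strasse: plz:5023 Stadt: Biberstein Kanton: AG",
    "Strasse:Musterstrasse 1 plz:5023 Stadt: Biberstein Kanton: AG",  # hypothetical row
    "Strasse: plz:5022 Stadt: Rombach Kanton: AG",
])

pattern = (
    r"Strasse:\s*(?P<street>.*?)\s*"
    r"plz:\s*(?P<zip>\d{4})\s*"
    r"Stadt:\s*(?P<municipality>.+?)\s*"
    r"Kanton:\s*(?P<canton>[A-Z]{2})"
)
parsed = location_parsed.str.extract(pattern)

# An empty street field is extracted as "", so turn it into a missing value.
parsed["street"] = parsed["street"].mask(parsed["street"] == "")

print(parsed)
```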
table
df['table'].head()
0 b <article class=####Box-cYFBPY hKrxoH####><h2...
1 b <article class=####Box-cYFBPY hKrxoH####><h2...
2 b <article class=####Box-cYFBPY hKrxoH####><h2...
3 b <article class=####Box-cYFBPY hKrxoH####><h2...
4 b <article class=####Box-cYFBPY hKrxoH####><h2...
Name: table, dtype: object
# raw string avoids the invalid "\/" escape; "/" needs no escaping in a Python regex
from_table = df['table'].str.extract(r"Municipality.+?rJZBK####>(.+?)</td>")
from_table.head()
| | 0 |
|---|---|
| 0 | Biberstein |
| 1 | Biberstein |
| 2 | NaN |
| 3 | Biberstein |
| 4 | Küttigen |
details_structured
df['details_structured'].head()
0 {'Municipality': 'Biberstein', 'Living space':...
1 {'Municipality': 'Biberstein', 'Living space':...
2 {'detail_responsive#municipality': 'Küttigen',...
3 {'Municipality': 'Biberstein', 'Living space':...
4 {'Municipality': 'Küttigen', 'Living space': '...
Name: details_structured, dtype: object
from_details_structured = df['details_structured'].str.extract("'Municipality': '(.+?)'")
from_details_structured.head()
| | 0 |
|---|---|
| 0 | Biberstein |
| 1 | Biberstein |
| 2 | NaN |
| 3 | Biberstein |
| 4 | Küttigen |
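Row 2 comes back as NaN because its dictionary stores the municipality under the key `detail_responsive#municipality` rather than `Municipality`. One way to cover both keys is to parse the string into a dictionary and look them up in turn. The sketch below assumes every cell is a valid Python dict literal, which the source does not guarantee, and uses shortened sample values.

```python
import ast

import pandas as pd

# Shortened sample values modelled on the output above.
details = pd.Series([
    "{'Municipality': 'Biberstein'}",
    "{'detail_responsive#municipality': 'Küttigen'}",
])

# literal_eval only evaluates Python literals, so it is safer than eval here.
as_dicts = details.apply(ast.literal_eval)

# Try the plain key first, then fall back to the prefixed variant.
municipality = as_dicts.apply(
    lambda d: d.get("Municipality") or d.get("detail_responsive#municipality")
)

print(municipality.tolist())
```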
This feature records the floor on which an object is located. It is found in the following columns:
Floor, Floor_merged, Piano, Stockwerk, Étage, detail_responsive#floor, table, details_structured

Data type: either an integer followed by a string, such as «1. floor», or a string only, such as «Ground floor».
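To use the floor as a numeric feature, the two string forms could be mapped to numbers. The sketch below assumes that «Ground floor» corresponds to 0 and that all other values start with the floor number followed by a dot; both are assumptions, and the sample values are invented.

```python
import pandas as pd

# Invented sample values following the two forms described above.
floor_raw = pd.Series(["1. floor", "Ground floor", "12. floor", None])

# Pull out the leading number; rows without one become NaN.
floor_num = pd.to_numeric(floor_raw.str.extract(r"^(\d+)\.", expand=False))

# Assumption: the ground floor is encoded as 0.
floor_num = floor_num.mask(floor_raw.eq("Ground floor"), 0)

print(floor_num.tolist())
```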
The floor space is the part of the area that can be used in accordance with the respective purpose. Immoscout DE provides information on which rooms count as floor space and which do not.
Columns: Floor space, Floor_space_merged, Nutzfläche, Superficie utile, Surface utile, detail_responsive#surface_usable, table, details_structured

Data type: integer
According to Swiss Life, there is no uniform, legally binding definition of how living space must be measured. Nevertheless, Wikipedia notes that it is a determining factor for the rent or purchase price.

Columns: Living space, Living_space_merged, Superficie abitabile, Surface habitable, Wohnfläche, detail_responsive#surface_living, description, table, details_structured

Data type: integer
These are various measurements collected from the BfS (Bundesamt für Statistik, the Swiss Federal Statistical Office) for a given municipality.

Columns: NoisePollutionRailwayL, NoisePollutionRailwayM, NoisePollutionRailwayS, NoisePollutionRoadL, NoisePollutionRoadM, NoisePollutionRoadS, PopulationDensityL, PopulationDensityM, PopulationDensityS, RiversAndLakesL, RiversAndLakesM, RiversAndLakesS, ForestDensityL, ForestDensityM, ForestDensityS, WorkplaceDensityL, WorkplaceDensityM, WorkplaceDensityS

Data type: float, percentage on a 0-1 scale

Column: distanceToTrainStation

Data type: float
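A quick sanity check on the stated 0-1 range might look like the sketch below. It runs on an invented toy frame with one deliberately out-of-range value; on the real data the same comparison would be applied to the environment columns listed above.

```python
import pandas as pd

# Invented values; one entry is deliberately outside the 0-1 range.
env = pd.DataFrame({
    "PopulationDensityS": [0.12, 0.40],
    "PopulationDensityM": [0.20, 1.30],
    "ForestDensityL": [0.05, 0.60],
})

# Comparisons with NaN evaluate to False, so missing values are not flagged here.
out_of_range = (env < 0) | (env > 1)
violations = out_of_range.sum()

print(violations)
```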
The data in the following columns is not provided by immoscout24 and has been collected for the given municipalities:
gde_area_agriculture_percentage, gde_area_forest_percentage, gde_area_nonproductive_percentage, gde_area_settlement_percentage, gde_average_house_hold, gde_empty_apartments, gde_foreigners_percentage, gde_new_homes_per_1000, gde_politics_bdp, gde_politics_cvp, gde_politics_evp, gde_politics_fdp, gde_politics_glp, gde_politics_gps, gde_politics_pda, gde_politics_rights, gde_politics_sp, gde_politics_svp, gde_pop_per_km1, gde_population, gde_private_apartments, gde_social_help_quota, gde_tax, gde_workers_sector1, gde_workers_sector2, gde_workers_sector3, gde_workers_total

Data type: float
The following rooms are counted as whole rooms when renting or selling an apartment:
Officially, however, there is no definition of what counts as half a room, so this information can only be used as a guide. Bathroom, shower and kitchen are not counted as rooms. source
Columns: rooms, description, details, details_structured

Data type: float
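Where the rooms column is empty, the room count might be recoverable from free text. The sketch below assumes the text contains a phrase like `4.5 rooms` with a dot as decimal separator; the actual wording in the description column is not shown in the source, and the sample values are invented.

```python
import pandas as pd

# Invented sample texts; the real wording in the dataset may differ.
description = pd.Series(["4.5 rooms, bright and quiet", "3 rooms", "Studio"])

# Capture a whole number, optionally followed by ".5" for a half room.
rooms_text = description.str.extract(r"(\d+(?:\.5)?) rooms", expand=False)
rooms = pd.to_numeric(rooms_text)

# Flag listings that include a half room.
has_half_room = (rooms % 1) == 0.5

print(rooms.tolist(), has_half_room.tolist())
```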